Collective Document Classification with Implicit Inter-document Semantic Relationships
Authors
Abstract
This paper addresses the question of how document classifiers can exploit implicit information about document similarity to improve classification accuracy. We infer document similarity using simple n-gram overlap, and demonstrate that this improves overall document classification performance over two datasets. As part of this, we find that collective classification based on simple iterative classifiers outperforms the more complex and computationally intensive dual classifier approach.
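To make the idea of inferring document similarity from "simple n-gram overlap" concrete, here is a minimal sketch. The abstract does not say which overlap measure, n-gram order, or tokenisation the authors use, so the Jaccard coefficient over word bigrams, the whitespace tokeniser, and the function names `word_ngrams` and `ngram_overlap` are all assumptions made for illustration only.

```python
def word_ngrams(text, n=2):
    """Return the set of word n-grams in a lowercased, whitespace-tokenised text.

    Whitespace tokenisation is an assumption; the abstract does not specify one.
    """
    tokens = text.lower().split()
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}


def ngram_overlap(doc_a, doc_b, n=2):
    """Jaccard overlap of the two documents' word n-gram sets.

    Jaccard is an assumed stand-in for the paper's overlap measure.
    Returns 0.0 when either document is too short to yield an n-gram.
    """
    grams_a = word_ngrams(doc_a, n)
    grams_b = word_ngrams(doc_b, n)
    if not grams_a or not grams_b:
        return 0.0
    return len(grams_a & grams_b) / len(grams_a | grams_b)


if __name__ == "__main__":
    score = ngram_overlap("the cat sat on the mat", "the cat sat on a mat")
    print(f"bigram overlap: {score:.3f}")
```

In a collective classification setting, scores like these could be thresholded to decide which document pairs are treated as implicitly related; how the paper actually does this is not stated in the abstract.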
Similar resources
Burford, Clint, Steven Bird and Timothy Baldwin (to appear) Collective Document Classification with Implicit Inter-document Semantic Relationships, In Proceedings of *SEM 2015: The Fourth Joint Conference on Lexical and Computational Semantics, Denver, USA
This paper addresses the question of how document classifiers can exploit implicit information about document similarity to improve document classifier accuracy. We infer document similarity using simple n-gram overlap, and demonstrate that this improves overall document classification performance over two datasets. As part of this, we find that collective classification based on simple iterati...
Collective document classification using explicit and implicit inter-document relationships
Information systems are transforming the ways in which people generate, store and share information. One consequence of this change is a massive increase in the quantity of digital content the average person needs to deal with. A large part of the information systems challenge is about finding intelligent ways to help users locate and analyse this information. One tool that is available to buil...
A Joint Semantic Vector Representation Model for Text Clustering and Classification
Text clustering and classification are two main tasks of text mining. Feature selection plays a key role in the quality of the clustering and classification results. Although word-based features such as term frequency-inverse document frequency (TF-IDF) vectors have been widely used in different applications, their shortcoming in capturing semantic concepts of text motivated researchers to use...
Ontologies as Expectations of Term Co-occurrences
Vector space models (VSMs) are often employed as mathematical representations of documents for tasks like indexing, information retrieval (IR), filtering and others, where documents are represented as vectors of their index terms or keywords. In a vector space representation, interrelations between individual terms in the vector are not captured. IR tasks like classification or search rely on t...
Learning Document Image Features With SqueezeNet Convolutional Neural Network
The classification of various document images is considered an important step towards building a modern digital library or office automation system. Convolutional Neural Network (CNN) classifiers trained with backpropagation are considered to be the current state of the art model for this task. However, there are two major drawbacks for these classifiers: the huge computational power demand for...
Publication date: 2015